For more details, datasets, and analysis scripts, visit our GitHub webpage.
Teamwork Approach:
Execution:
This strategy ensured our project's success, reflecting our coordinated understanding and effort.
The QS World University Rankings are a globally recognized framework for evaluating higher education institutions. This project will analyze ranking trends from 2022 to 2024 to uncover patterns and determinants of university performance. The findings will serve as an empirical guide for stakeholders in the education sector.
QS World University Rankings
The QS World University Rankings provide a comprehensive evaluation of over 1,000 higher education institutions globally. Sourced from Quacquarelli Symonds (QS), these rankings are recognized worldwide for their depth of research and breadth of data regarding university performance. The datasets for 2022, 2023, and 2024, accessible through the QS website, form the primary basis of our analysis. These tables offer detailed insights into various performance metrics such as academic reputation, employer reputation, faculty-student ratio, citations per faculty, international faculty, and international students scores. By analyzing these datasets, we aim to uncover trends, evaluate shifts in rankings, and identify the determinants of university performance across the specified years.
The QS ranking methodology utilizes several metrics to gauge university performance, each capturing a distinct aspect of university excellence:
Academic Reputation Score (40% weight): Derived from a global academic survey, this score reflects the perceived research quality and academic standing of an institution.
Employer Reputation Score (10% weight): Based on a survey of employers, this score indicates the employability and preparedness of graduates in the workforce.
Faculty Student Score (20% weight): This metric measures the faculty-to-student ratio, providing insight into the teaching and learning environment of the university.
Citations per Faculty Score (20% weight): A measure of research impact, this score is calculated based on the average citations per faculty member, indicating research influence and quality.
International Faculty Score (5% weight): This score assesses the diversity of the faculty by measuring the proportion of international faculty members at the institution.
International Students Score (5% weight): Similarly, this score evaluates the diversity of the student body by looking at the percentage of international students.
Overall Score: A composite score that combines all individual metrics, representing a summarized assessment of a university's overall ranking performance.
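To make the weighting concrete, the following minimal sketch combines the six metric scores into a weighted sum using the weights listed above. The name `weighted_composite` and the toy values are illustrative assumptions, not part of the QS data or tooling, and the published Overall Score may differ from this plain weighted sum (for example, if QS rescales the result or applies different weights in a given year).

```python
import pandas as pd

# Weights as listed above, expressed as fractions that sum to 1.0
QS_WEIGHTS = {
    'Academic Reputation Score': 0.40,
    'Employer Reputation Score': 0.10,
    'Faculty Student Score': 0.20,
    'Citations per Faculty Score': 0.20,
    'International Faculty Score': 0.05,
    'International Students Score': 0.05,
}

def weighted_composite(df, weights=QS_WEIGHTS):
    """Hypothetical helper: weighted sum of the metric columns for each row.

    Rows with any missing metric yield NaN rather than a partial sum.
    """
    scores = df[list(weights)].apply(pd.to_numeric, errors='coerce')
    weighted = scores * pd.Series(weights)  # aligns weights with column names
    return weighted.sum(axis=1, min_count=len(weights))

# Toy data with made-up scores (not taken from the QS tables)
toy = pd.DataFrame({
    'Academic Reputation Score': [100.0, 50.0],
    'Employer Reputation Score': [100.0, 80.0],
    'Faculty Student Score': [100.0, 60.0],
    'Citations per Faculty Score': [100.0, 40.0],
    'International Faculty Score': [100.0, 20.0],
    'International Students Score': [100.0, 10.0],
})
toy['Composite'] = weighted_composite(toy)
print(toy['Composite'].round(2))
```

Comparing such a composite with the published Overall Score is one way to check how closely a given year's table follows the stated weights.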
import pandas as pd
import os
import sys
import matplotlib.pyplot as plt
import seaborn as sns
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
!git clone https://github.com/weike2001/ds
Cloning into 'ds'...
remote: Enumerating objects: 35, done.
remote: Counting objects: 100% (35/35), done.
remote: Compressing objects: 100% (31/31), done.
remote: Total 35 (delta 5), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (35/35), 4.55 MiB | 6.64 MiB/s, done.
Resolving deltas: 100% (5/5), done.
import pandas as pd
# Set the paths to the Excel files in the cloned repository
file_path_2022 = '/content/ds/data/2022_QS_World_University_Rankings_Results_public_version.xlsx'
file_path_2023 = '/content/ds/data/2023 QS World University Rankings V2.1 (For qs.com).xlsx'
file_path_2024 = '/content/ds/data/2024 QS World University Rankings 1.2 (For qs.com).xlsx'
# Read the data into pandas DataFrames
df_2022 = pd.read_excel(file_path_2022)
df_2023 = pd.read_excel(file_path_2023)
df_2024 = pd.read_excel(file_path_2024)
# Build the CSV output paths next to the source Excel files
csv_file_path_2022 = file_path_2022.replace('.xlsx', '.csv')
csv_file_path_2023 = file_path_2023.replace('.xlsx', '.csv')
csv_file_path_2024 = file_path_2024.replace('.xlsx', '.csv')
# Save the DataFrames as CSV files
df_2022.to_csv(csv_file_path_2022, index=False)
df_2023.to_csv(csv_file_path_2023, index=False)
df_2024.to_csv(csv_file_path_2024, index=False)
Adjust the column names in each CSV file
import pandas as pd
# Define the new specific column names
specific_column_names_2022 = [
'National Rank', 'Regional Rank', '2022 Rank', '2021 Rank', 'Institution Name',
'Location Code', 'Country/Territory', 'Size', 'Focus', 'Research Intensity',
'Age Band', 'Status', 'Academic Reputation Score', 'Academic Reputation Rank',
'Employer Reputation Score', 'Employer Reputation Rank', 'Faculty Student Score',
'Faculty Student Rank', 'Citations per Faculty Score', 'Citations per Faculty Rank',
'International Faculty Score', 'International Faculty Rank', 'International Students Score',
'International Students Rank', 'Overall Score'
]
specific_column_names_2023 = [
'2023 Rank', '2022 Rank', 'Institution Name', 'Location Code', 'Country/Territory',
'Size', 'Focus', 'Research Intensity', 'Age Band', 'Status',
'Academic Reputation Score', 'Academic Reputation Rank',
'Employer Reputation Score', 'Employer Reputation Rank',
'Faculty Student Score', 'Faculty Student Rank',
'Citations per Faculty Score', 'Citations per Faculty Rank',
'International Faculty Score', 'International Faculty Rank',
'International Students Score', 'International Students Rank',
'International Research Network Score', 'International Research Network Rank',
'Employment Outcomes Score', 'Employment Outcomes Rank',
'Overall Score'
]
specific_column_names_2024 = [
'2024 Rank', '2023 Rank', 'Institution Name', 'Location Code', 'Country/Territory',
'Size', 'Focus', 'Research Intensity', 'Status',
'Academic Reputation Score', 'Academic Reputation Rank',
'Employer Reputation Score', 'Employer Reputation Rank',
'Faculty Student Score', 'Faculty Student Rank',
'Citations per Faculty Score', 'Citations per Faculty Rank',
'International Faculty Score', 'International Faculty Rank',
'International Students Score', 'International Students Rank',
'International Research Network Score', 'International Research Network Rank',
'Employment Outcomes Score', 'Employment Outcomes Rank',
'Sustainability Score', 'Sustainability Rank',
'Overall Score'
]
print(len(specific_column_names_2024))
# Reading the CSV files into Pandas DataFrames
df_2022 = pd.read_csv(csv_file_path_2022, skiprows = 4, names=specific_column_names_2022)
df_2023 = pd.read_csv(csv_file_path_2023, skiprows = 4, names=specific_column_names_2023)
df_2024 = pd.read_csv(csv_file_path_2024, skiprows = 4, names=specific_column_names_2024)
df_2022.head()
28
|  | National Rank | Regional Rank | 2022 Rank | 2021 Rank | Institution Name | Location Code | Country/Territory | Size | Focus | Research Intensity | ... | Employer Reputation Rank | Faculty Student Score | Faculty Student Rank | Citations per Faculty Score | Citations per Faculty Rank | International Faculty Score | International Faculty Rank | International Students Score | International Students Rank | Overall Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | Massachusetts Institute of Technology (MIT) | US | United States | M | CO | VH | ... | 4 | 100.0 | 12 | 100.0 | 6 | 100.0 | 45 | 91.4 | 105 | 100 |
| 1 | 1 | 1 | 2 | 5 | University of Oxford | UK | United Kingdom | L | FC | VH | ... | 3 | 100.0 | 5 | 96.0 | 34 | 99.5 | 83 | 98.5 | 52 | 99.5 |
| 2 | 2 | 2 | 3= | 2 | Stanford University | US | United States | L | FC | VH | ... | 5 | 100.0 | 9 | 99.9 | 10 | 99.8 | 73 | 67.0 | 208 | 98.7 |
| 3 | 2 | 2 | 3= | 7 | University of Cambridge | UK | United Kingdom | L | FC | VH | ... | 2 | 100.0 | 10 | 92.1 | 48 | 100.0 | 57 | 97.7 | 64 | 98.7 |
| 4 | 3 | 3 | 5 | 3 | Harvard University | US | United States | L | FC | VH | ... | 1 | 99.1 | 37 | 100.0 | 3 | 84.2 | 188 | 70.1 | 196 | 98 |
5 rows × 25 columns
df_2023.head()
|  | 2023 Rank | 2022 Rank | Institution Name | Location Code | Country/Territory | Size | Focus | Research Intensity | Age Band | Status | ... | Citations per Faculty Rank | International Faculty Score | International Faculty Rank | International Students Score | International Students Rank | International Research Network Score | International Research Network Rank | Employment Outcomes Score | Employment Outcomes Rank | Overall Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Massachusetts Institute of Technology (MIT) | US | United States | M | CO | VH | 5.0 | B | ... | 5 | 100.0 | 54 | 90.0 | 109 | 96.1 | 58 | 100.0 | 3 | 100 |
| 1 | 2 | 3= | University of Cambridge | UK | United Kingdom | L | FC | VH | 5.0 | A | ... | 55 | 100.0 | 60 | 96.3 | 70 | 99.5 | 6 | 100.0 | 9 | 98.8 |
| 2 | 3 | 3= | Stanford University | US | United States | L | FC | VH | 5.0 | B | ... | 9 | 99.8 | 74 | 60.3 | 235 | 96.3 | 55 | 100.0 | 2 | 98.5 |
| 3 | 4 | 2 | University of Oxford | UK | United Kingdom | L | FC | VH | 5.0 | A | ... | 64 | 98.8 | 101 | 98.4 | 54 | 99.9 | 3 | 100.0 | 7 | 98.4 |
| 4 | 5 | 5 | Harvard University | US | United States | L | FC | VH | 5.0 | B | ... | 2 | 76.9 | 228 | 66.9 | 212 | 100.0 | 1 | 100.0 | 1 | 97.6 |
5 rows × 27 columns
df_2024.head()
|  | 2024 Rank | 2023 Rank | Institution Name | Location Code | Country/Territory | Size | Focus | Research Intensity | Status | Academic Reputation Score | ... | International Faculty Rank | International Students Score | International Students Rank | International Research Network Score | International Research Network Rank | Employment Outcomes Score | Employment Outcomes Rank | Sustainability Score | Sustainability Rank | Overall Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Massachusetts Institute of Technology (MIT) | US | United States | M | CO | VH | B | 100.0 | ... | 56 | 88.2 | 128 | 94.3 | 58 | 100.0 | 4 | 95.2 | 51 | 100 |
| 1 | 2 | 2 | University of Cambridge | UK | United Kingdom | L | FC | VH | A | 100.0 | ... | 64 | 95.8 | 85 | 99.9 | 7 | 100.0 | 6 | 97.3 | 33= | 99.2 |
| 2 | 3 | 4 | University of Oxford | UK | United Kingdom | L | FC | VH | A | 100.0 | ... | 110 | 98.2 | 60 | 100.0 | 1 | 100.0 | 3 | 97.8 | 26= | 98.9 |
| 3 | 4 | 5 | Harvard University | US | United States | L | FC | VH | B | 100.0 | ... | 210 | 66.8 | 223 | 100.0 | 5 | 100.0 | 1 | 96.7 | 39 | 98.3 |
| 4 | 5 | 3 | Stanford University | US | United States | L | FC | VH | B | 100.0 | ... | 78 | 51.2 | 284 | 95.8 | 44 | 100.0 | 2 | 94.4 | 63 | 98.1 |
5 rows × 28 columns
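Because `names=` assigns labels purely by position, a list whose length differs from the file's actual column count can mislabel columns without an obvious error. A minimal sketch of a sanity check follows; `count_raw_columns` is a hypothetical helper, and the CSV text is made up for illustration rather than taken from the QS files.

```python
import io
import pandas as pd

def count_raw_columns(source, skiprows=0):
    """Read a few rows without assigning names and return the column count."""
    preview = pd.read_csv(source, skiprows=skiprows, header=None, nrows=5)
    return preview.shape[1]

# Toy CSV standing in for an exported ranking file (made-up content):
# one title line, one original header line, then the data rows
toy_csv = (
    "title line\n"
    "Rank,Name,Score\n"
    "1,Alpha University,100\n"
    "2,Beta University,98.5\n"
)
names = ['Rank', 'Institution Name', 'Overall Score']

# Compare the raw column count with the number of names before renaming
n_cols = count_raw_columns(io.StringIO(toy_csv), skiprows=2)
if n_cols != len(names):
    raise ValueError(f"Expected {len(names)} columns, found {n_cols}")

df_toy = pd.read_csv(io.StringIO(toy_csv), skiprows=2, names=names)
print(df_toy)
```

The same check could be run against each yearly CSV path with `skiprows=4` before the `read_csv` calls above.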
In this section, we prepare the 'Overall Score' data from the QS World University Rankings for 2022, 2023, and 2024. The preparation involves two key steps: first, replacing hyphen placeholders ('-') with NaN (Not a Number) to standardize the dataset for numerical analysis; second, converting the 'Overall Score' column to a numeric type.
This data preparation is essential for analyzing global university ranking trends and setting the stage for further in-depth examination of university performances.
import pandas as pd
import numpy as np
# Replace hyphens with NaN and convert the column to numeric
df_2022['Overall Score'] = pd.to_numeric(df_2022['Overall Score'].replace('-', np.nan), errors='coerce')
df_2023['Overall Score'] = pd.to_numeric(df_2023['Overall Score'].replace('-', np.nan), errors='coerce')
df_2024['Overall Score'] = pd.to_numeric(df_2024['Overall Score'].replace('-', np.nan), errors='coerce')
# 'Overall Score' is now a float column, with NaN wherever the source had a hyphen ('-').
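The effect of this conversion can be seen on a small example. The sketch below uses a made-up Series; `clean_score` is a hypothetical helper name that wraps the same replace-and-convert step.

```python
import numpy as np
import pandas as pd

def clean_score(series):
    """Replace '-' placeholders with NaN and convert the values to floats."""
    return pd.to_numeric(series.replace('-', np.nan), errors='coerce')

# Made-up raw values: two numeric strings, a hyphen, and another non-numeric entry
raw = pd.Series(['100', '98.7', '-', 'n/a'])
cleaned = clean_score(raw)

print(cleaned.dtype)               # numeric dtype after conversion
print(int(cleaned.isna().sum()))   # number of entries that became NaN
```

Note that `errors='coerce'` turns any non-numeric entry into NaN, not only hyphens, so it is worth checking how many values were lost in the conversion.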
In our analysis of the QS World University Rankings datasets spanning 2022 to 2024, we focus on the metrics that most influence a university's prestige and global ranking: the individual indicator scores and the Overall Score reported in each year's table.
For these pivotal metrics, we compute the mean, standard deviation, median, minimum, and maximum values to provide a distilled overview of university performance. This analysis will shed light on the average achievements, consistency, and range within these critical areas, offering stakeholders a succinct and strategic insight into the dynamics shaping university rankings.
import pandas as pd
df_2022.describe()
|  | Age Band | Academic Reputation Score | Employer Reputation Score | Faculty Student Score | Citations per Faculty Score | International Faculty Score | International Students Score | Overall Score |
|---|---|---|---|---|---|---|---|---|
| count | 1300.000000 | 1300.000000 | 1300.000000 | 1299.000000 | 1300.000000 | 1228.000000 | 1275.000000 | 501.000000 |
| mean | 4.011538 | 21.552462 | 22.193000 | 31.907313 | 26.293308 | 26.503746 | 28.119059 | 44.767066 |
| std | 0.988318 | 23.315627 | 24.535947 | 28.564402 | 28.299027 | 35.429502 | 31.211629 | 18.961269 |
| min | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 24.100000 |
| 25% | 3.000000 | 6.200000 | 5.100000 | 9.400000 | 3.400000 | 1.700000 | 3.750000 | 29.600000 |
| 50% | 4.000000 | 11.900000 | 11.950000 | 20.600000 | 13.400000 | 5.400000 | 13.200000 | 38.600000 |
| 75% | 5.000000 | 25.925000 | 29.625000 | 47.950000 | 43.400000 | 44.425000 | 44.450000 | 55.400000 |
| max | 5.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 |
df_2023.describe()
|  | Age Band | Academic Reputation Score | Employer Reputation Score | Faculty Student Score | Citations per Faculty Score | International Faculty Score | International Students Score | International Research Network Score | Employment Outcomes Score | Overall Score |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 1411.000000 | 1422.000000 | 1421.000000 | 1420.000000 | 1417.000000 | 1324.000000 | 1365.000000 | 1409.000000 | 1410.000000 | 500.000000 |
| mean | 4.008505 | 20.124684 | 20.657143 | 29.997113 | 24.529358 | 31.659517 | 26.545348 | 49.570121 | 26.186809 | 44.619400 |
| std | 0.965320 | 22.802706 | 24.027928 | 28.172207 | 27.910952 | 34.170817 | 30.896854 | 30.205439 | 26.201036 | 18.655057 |
| min | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 24.200000 |
| 25% | 3.000000 | 5.400000 | 4.400000 | 8.200000 | 3.100000 | 4.800000 | 3.300000 | 21.600000 | 6.700000 | 29.800000 |
| 50% | 4.000000 | 10.800000 | 10.300000 | 18.250000 | 11.100000 | 13.750000 | 10.800000 | 47.700000 | 15.500000 | 38.550000 |
| 75% | 5.000000 | 23.775000 | 27.000000 | 43.500000 | 39.400000 | 55.075000 | 40.500000 | 77.600000 | 36.900000 | 54.500000 |
| max | 5.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 |
df_2024.describe()
|  | Academic Reputation Score | Employer Reputation Score | Faculty Student Score | Citations per Faculty Score | International Faculty Score | International Students Score | International Research Network Score | Employment Outcomes Score | Sustainability Score | Overall Score |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 1498.000000 | 1497.000000 | 1474.000000 | 1474.000000 | 1372.000000 | 1418.000000 | 1494.000000 | 1474.000000 | 1398.000000 | 602.000000 |
| mean | 20.132043 | 19.806880 | 28.643894 | 23.940163 | 30.948834 | 25.575035 | 23.967938 | 20.016961 | 25.412017 | 40.879900 |
| std | 22.365895 | 23.764625 | 27.843868 | 28.075573 | 34.247562 | 30.867149 | 30.371277 | 20.241410 | 31.010557 | 19.181335 |
| min | 1.600000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 19.800000 |
| 25% | 6.000000 | 4.100000 | 7.500000 | 2.800000 | 4.300000 | 3.000000 | 1.200000 | 8.225000 | 1.400000 | 25.700000 |
| 50% | 10.900000 | 9.500000 | 16.750000 | 10.400000 | 13.050000 | 9.850000 | 6.850000 | 11.700000 | 8.400000 | 34.550000 |
| 75% | 23.100000 | 25.500000 | 41.900000 | 37.900000 | 52.725000 | 38.075000 | 40.375000 | 22.475000 | 42.525000 | 51.300000 |
| max | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 |
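`describe()` reports counts and quartiles for every numeric column. To restrict the output to the mean, standard deviation, median, minimum, and maximum for a chosen set of metrics, one option is `DataFrame.agg`. The sketch below uses made-up scores, and `summarize_metrics` is a hypothetical helper name.

```python
import pandas as pd

def summarize_metrics(df, metrics):
    """Return mean, std, median, min, and max for the given columns."""
    numeric = df[metrics].apply(pd.to_numeric, errors='coerce')
    return numeric.agg(['mean', 'std', 'median', 'min', 'max'])

# Toy data with made-up values; the missing Overall Score is ignored by the statistics
toy = pd.DataFrame({
    'Academic Reputation Score': [10.0, 20.0, 30.0],
    'Overall Score': [40.0, None, 60.0],
})
summary = summarize_metrics(toy, ['Academic Reputation Score', 'Overall Score'])
print(summary)
```

Applied to the real tables, a call such as `summarize_metrics(df_2022, ['Academic Reputation Score', 'Overall Score'])` would give one column per metric and one row per statistic.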
This section is dedicated to a comprehensive examination of the QS World University Rankings' metrics. We aim to dissect each component of the ranking system to provide an intricate understanding of how universities are evaluated and ranked on the global stage.
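The QS ranking framework employs a set of metrics, each designed to quantify a distinct aspect of university performance. The six weighted metrics examined here are Academic Reputation, Employer Reputation, Faculty Student, Citations per Faculty, International Faculty, and International Students.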
The Overall Score represents a consolidated assessment derived from these individual metrics, dictating the university's ranking.
Through this deep dive into the QS ranking metrics, we seek to elucidate the nuances that underpin university rankings, providing a clear guide for institutions aiming to enhance their global standing.
import matplotlib.pyplot as plt
import seaborn as sns
qs_metrics_weights = {
'Academic Reputation Score': {"weight": 0.40},
'Employer Reputation Score': {"weight": 0.10},
'Faculty Student Score': {"weight": 0.20},
'Citations per Faculty Score': {"weight": 0.20},
'International Faculty Score': {"weight": 0.05},
'International Students Score': {"weight": 0.05},
}
def create_grid_layout_without_definitions(df, metrics_info, year):
    """Plot a 2x3 grid of histograms, one per QS metric, for the given year."""
    # Set up the figure with subplots
    fig, axes = plt.subplots(2, 3, figsize=(20, 10))  # Adjust figure size as needed
    axes = axes.ravel()
    palette = sns.color_palette("coolwarm", len(metrics_info))
    # Plot each metric in the grid
    for ax, (metric, info), color in zip(axes, metrics_info.items(), palette):
        weight = info['weight']
        sns.histplot(df[metric], kde=True, ax=ax, color=color, alpha=0.7, linewidth=0.5)
        # Show the weight as a whole-number percentage, e.g. "40%"
        ax.set_title(f"{metric} ({weight:.0%})", fontsize=10)
        ax.set_xlabel('Score', fontsize=9)
    # Add a main title and adjust layout
    plt.suptitle(f'Distribution of QS Ranking Metrics for {year}', fontsize=16)
    plt.tight_layout(rect=[0, 0.03, 1, 0.95])  # Leave room for the main title
    plt.show()
# Plot the metric distributions for each year's dataset
create_grid_layout_without_definitions(df_2022, qs_metrics_weights, '2022')
create_grid_layout_without_definitions(df_2023, qs_metrics_weights, '2023')
create_grid_layout_without_definitions(df_2024, qs_metrics_weights, '2024')
Across 2022, 2023, and 2024, the key QS metrics follow a consistent pattern. The Academic Reputation Score, the most heavily weighted metric, is strongly right-skewed: its median stays near 11 to 12 in every year (11.9, 10.8, and 10.9) against a maximum of 100, so high scores are concentrated in a small group of established institutions. The Employer Reputation and Faculty Student scores shift modestly from year to year, which may reflect evolving perceptions of graduate quality and of educational resource allocation, although the growing number of ranked institutions (1,300 in 2022 to nearly 1,500 in 2024) also affects these distributions. The Citations per Faculty and internationalization scores are widely dispersed, with standard deviations of roughly 28 to 35 points, indicating large differences in research impact and global engagement. Taken together, these trends are consistent with sustained dominance by a small set of leading universities under the QS criteria.
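The skew described above can be quantified rather than read off the histograms. The following is a minimal sketch, assuming the yearly DataFrames are collected in a dictionary; `skew_by_year` is a hypothetical helper, and the toy numbers are made up. A positive value indicates a long right tail, that is, a few institutions scoring far above the rest.

```python
import pandas as pd

def skew_by_year(frames, metrics):
    """Return sample skewness with one row per metric and one column per year."""
    return pd.DataFrame({
        year: df[metrics].apply(pd.to_numeric, errors='coerce').skew()
        for year, df in frames.items()
    })

# Toy data with made-up scores: most values are low and one is very high
toy_frames = {
    '2022': pd.DataFrame({'Academic Reputation Score': [1, 2, 3, 4, 100]}),
    '2023': pd.DataFrame({'Academic Reputation Score': [1, 2, 3, 5, 90]}),
}
skew_table = skew_by_year(toy_frames, ['Academic Reputation Score'])
print(skew_table.round(2))
```

With the notebook's data, the call might look like `skew_by_year({'2022': df_2022, '2023': df_2023, '2024': df_2024}, list(qs_metrics_weights))`.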
To gain a deeper understanding of the global landscape of higher education as reflected in the QS World University Rankings, we employ choropleth maps to visualize the distribution of ranked universities by country for the years 2022, 2023, and 2024. This geographic analysis allows us to observe trends, patterns, and potentially the regional dynamics influencing higher education excellence on a global scale.
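The function create_choropleth_map is crafted to count the ranked universities in each country or territory, collect those counts in a DataFrame, and draw a Plotly choropleth map shaded by the number of universities.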
Here's a brief overview of the function and its application:
import pandas as pd
import plotly.express as px
import plotly
#enable_plotly_in_cell()
def create_choropleth_map(dataframe, column_name, title):
    """Draw a world map shaded by the number of ranked universities per country."""
    # Generate a dictionary of value counts for the specified column
    sample_data = dataframe[column_name].value_counts().to_dict()
    # Convert the dictionary into a DataFrame
    df_counts = pd.DataFrame(list(sample_data.items()), columns=['Country', 'University_Count'])
    #print(df_counts)
    # Create the choropleth map
    fig = px.choropleth(df_counts,
                        locations="Country",
                        locationmode='country names',
                        color="University_Count",
                        color_continuous_scale=px.colors.sequential.Reds,  # Reds color scale
                        title=title)
    # Update the layout
    fig.update_layout(
        geo=dict(
            showframe=False,
            showcoastlines=False,
            projection_type='equirectangular'
        )
    )
    # Show the figure
    fig.show(renderer="notebook")
# Use the function with your DataFrame and column
create_choropleth_map(df_2022, 'Country/Territory', 'Number of Universities per Country in 2022')
create_choropleth_map(df_2023, 'Country/Territory', 'Number of Universities per Country in 2023')
create_choropleth_map(df_2024, 'Country/Territory', 'Number of Universities per Country in 2024')
The choropleth maps for the QS World University Rankings from 2022 through 2024 consistently show that North America and Europe maintain a dominant presence with the highest number of globally recognized universities. This steadfast pattern underscores the concentration of academic prestige and resources in these regions. Despite the passage of time, the geographic distribution of leading institutions remains relatively unchanged, highlighting a persistent imbalance in global educational prominence. The continuity of this trend into 2024 further suggests that while there is global progress in higher education, efforts to diversify and enhance representation in the rankings could be strengthened to reflect a more inclusive global academic landscape.
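To check this stability numerically, the per-country counts behind the maps can be placed side by side. The following is a minimal sketch with made-up rows; `country_counts_by_year` is a hypothetical helper name, and the column name is assumed to match the 'Country/Territory' column used above.

```python
import pandas as pd

def country_counts_by_year(frames, column='Country/Territory'):
    """Return university counts with one row per country and one column per year."""
    counts = pd.concat(
        {year: df[column].value_counts() for year, df in frames.items()},
        axis=1,
    )
    # Countries absent in a given year get a count of 0
    return counts.fillna(0).astype(int)

# Toy data with made-up rows (not taken from the QS tables)
toy_frames = {
    '2022': pd.DataFrame({'Country/Territory': ['United States', 'United States', 'United Kingdom']}),
    '2023': pd.DataFrame({'Country/Territory': ['United States', 'United Kingdom', 'China']}),
}
table = country_counts_by_year(toy_frames)
print(table)
```

Sorting the resulting table by the latest year, or differencing its first and last columns, would show which countries gained or lost ranked institutions.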
Our Excel files come from the links below: